{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Reshaping\n",
"\n",
"Antes de analizar los datos, se necesita formar los datos obtenidos en un formato regular y que sea procesable por el algoritumo que luego utilizaremos. Es necesario asegurar que todos los datos correspondan con las variables. También, es necesario lidiar con los valores nulos, si es que hubiese. En términos generales, se puede decir que Data Reshaping es cambiar la manera en que los datos están organizados en coumnas y filas."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. [Join](#1)\n",
"2. [Union](#2)\n",
"3. [Stack, Unstack](#3)\n",
"4. [Pivot](#4)\n",
"5. [Melt](#5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Join\n",
"\n",
"\n",
"\n",
"Join, o merge, es el proceso de unir dos DataFrame diferentes en uno solo. Por ejemplo, si tenemos dos DataFrames que contienen diferente información pero sobre los mismos clientes, podemos unirlos en uno solo.\n",
"\n",
"Aquí algunos enlaces extras:\n",
"\n",
"__[pandas.merge](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.merge.html)__\n",
"\n",
"__[Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 1](https://thispointer.com/pandas-how-to-merge-dataframes-using-dataframe-merge-in-python-part-1/)__\n",
"\n",
"__[Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 2](https://thispointer.com/pandas-merge-dataframes-on-specific-columns-or-on-index-in-python-part-2/)__\n",
"\n",
"__[Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 3](https://thispointer.com/pandas-how-to-merge-dataframes-by-index-using-dataframe-merge-part-3/)__"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
"
\n",
" \n",
" \n",
" \n",
" 146 | \n",
" 147 | \n",
" Male | \n",
" 48 | \n",
" 77 | \n",
"
\n",
" \n",
" 147 | \n",
" 148 | \n",
" Female | \n",
" 32 | \n",
" 77 | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" Female | \n",
" 34 | \n",
" 78 | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" Male | \n",
" 34 | \n",
" 78 | \n",
"
\n",
" \n",
" 150 | \n",
" 151 | \n",
" Female | \n",
" 48 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income\n",
"146 147 Male 48 77\n",
"147 148 Female 32 77\n",
"148 149 Female 34 78\n",
"149 150 Male 34 78\n",
"150 151 Female 48 80"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mall_customers_info = pd.read_csv(os.path.join(\"csv\", \"mall_customers_info.csv\"))\n",
"mall_customers_info.tail()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(151, 4)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mall_customers_info.shape"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 146 | \n",
" 147 | \n",
" 36 | \n",
"
\n",
" \n",
" 147 | \n",
" 148 | \n",
" 74 | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" 22 | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" 90 | \n",
"
\n",
" \n",
" 150 | \n",
" 152 | \n",
" 95 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Spending Score (1-100)\n",
"146 147 36\n",
"147 148 74\n",
"148 149 22\n",
"149 150 90\n",
"150 152 95"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mall_customers_score = pd.read_csv(os.path.join(\"csv\", \"mall_customers_score.csv\"))\n",
"mall_customers_score.tail()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(151, 2)"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mall_customers_score.shape"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 145 | \n",
" 146 | \n",
" Male | \n",
" 28 | \n",
" 77 | \n",
" 97 | \n",
"
\n",
" \n",
" 146 | \n",
" 147 | \n",
" Male | \n",
" 48 | \n",
" 77 | \n",
" 36 | \n",
"
\n",
" \n",
" 147 | \n",
" 148 | \n",
" Female | \n",
" 32 | \n",
" 77 | \n",
" 74 | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" Female | \n",
" 34 | \n",
" 78 | \n",
" 22 | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" Male | \n",
" 34 | \n",
" 78 | \n",
" 90 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"145 146 Male 28 77 97\n",
"146 147 Male 48 77 36\n",
"147 148 Female 32 77 74\n",
"148 149 Female 34 78 22\n",
"149 150 Male 34 78 90"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#customer_data = pd.merge(mall_customers_info[['CustomerID','Gender','Annual_Income']], mall_customers_score, on='CustomerID')\n",
"customer_data = pd.merge(mall_customers_info,mall_customers_score,on='CustomerID')\n",
"customer_data.tail()\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(150, 5)"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customer_data.shape"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 146 | \n",
" 147 | \n",
" Male | \n",
" 48 | \n",
" 77 | \n",
" 36.0 | \n",
"
\n",
" \n",
" 147 | \n",
" 148 | \n",
" Female | \n",
" 32 | \n",
" 77 | \n",
" 74.0 | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" Female | \n",
" 34 | \n",
" 78 | \n",
" 22.0 | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" Male | \n",
" 34 | \n",
" 78 | \n",
" 90.0 | \n",
"
\n",
" \n",
" 150 | \n",
" 151 | \n",
" Female | \n",
" 48 | \n",
" 80 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"146 147 Male 48 77 36.0\n",
"147 148 Female 32 77 74.0\n",
"148 149 Female 34 78 22.0\n",
"149 150 Male 34 78 90.0\n",
"150 151 Female 48 80 NaN"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customer_data = pd.merge(mall_customers_info,mall_customers_score,on='CustomerID', how='left')\n",
"customer_data.tail()"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 146 | \n",
" 147 | \n",
" Male | \n",
" 48.0 | \n",
" 77.0 | \n",
" 36 | \n",
"
\n",
" \n",
" 147 | \n",
" 148 | \n",
" Female | \n",
" 32.0 | \n",
" 77.0 | \n",
" 74 | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" Female | \n",
" 34.0 | \n",
" 78.0 | \n",
" 22 | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" Male | \n",
" 34.0 | \n",
" 78.0 | \n",
" 90 | \n",
"
\n",
" \n",
" 150 | \n",
" 152 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 95 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"146 147 Male 48.0 77.0 36\n",
"147 148 Female 32.0 77.0 74\n",
"148 149 Female 34.0 78.0 22\n",
"149 150 Male 34.0 78.0 90\n",
"150 152 NaN NaN NaN 95"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customer_data = pd.merge(mall_customers_info,mall_customers_score,on='CustomerID', how='right')\n",
"customer_data.tail()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 147 | \n",
" 148 | \n",
" Female | \n",
" 32.0 | \n",
" 77.0 | \n",
" 74.0 | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" Female | \n",
" 34.0 | \n",
" 78.0 | \n",
" 22.0 | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" Male | \n",
" 34.0 | \n",
" 78.0 | \n",
" 90.0 | \n",
"
\n",
" \n",
" 150 | \n",
" 151 | \n",
" Female | \n",
" 48.0 | \n",
" 80.0 | \n",
" NaN | \n",
"
\n",
" \n",
" 151 | \n",
" 152 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 95.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"147 148 Female 32.0 77.0 74.0\n",
"148 149 Female 34.0 78.0 22.0\n",
"149 150 Male 34.0 78.0 90.0\n",
"150 151 Female 48.0 80.0 NaN\n",
"151 152 NaN NaN NaN 95.0"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customer_data = pd.merge(mall_customers_info,mall_customers_score,on='CustomerID', how='outer')\n",
"customer_data.tail()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 145 | \n",
" 146 | \n",
" Male | \n",
" 28 | \n",
" 77 | \n",
" 97 | \n",
"
\n",
" \n",
" 146 | \n",
" 147 | \n",
" Male | \n",
" 48 | \n",
" 77 | \n",
" 36 | \n",
"
\n",
" \n",
" 147 | \n",
" 148 | \n",
" Female | \n",
" 32 | \n",
" 77 | \n",
" 74 | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" Female | \n",
" 34 | \n",
" 78 | \n",
" 22 | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" Male | \n",
" 34 | \n",
" 78 | \n",
" 90 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"145 146 Male 28 77 97\n",
"146 147 Male 48 77 36\n",
"147 148 Female 32 77 74\n",
"148 149 Female 34 78 22\n",
"149 150 Male 34 78 90"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customer_data = pd.merge(mall_customers_info,mall_customers_score,on='CustomerID', how='inner')\n",
"customer_data.tail()"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(150, 5)"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customer_data.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Union\n",
"\n",
"\n",
"\n",
"__[pandas.concat](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.concat.html)__"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 151 | \n",
" Male | \n",
" 43 | \n",
" 78 | \n",
" 17 | \n",
"
\n",
" \n",
" 1 | \n",
" 152 | \n",
" Male | \n",
" 39 | \n",
" 78 | \n",
" 88 | \n",
"
\n",
" \n",
" 2 | \n",
" 153 | \n",
" Female | \n",
" 44 | \n",
" 78 | \n",
" 20 | \n",
"
\n",
" \n",
" 3 | \n",
" 154 | \n",
" Female | \n",
" 38 | \n",
" 78 | \n",
" 76 | \n",
"
\n",
" \n",
" 4 | \n",
" 155 | \n",
" Female | \n",
" 47 | \n",
" 78 | \n",
" 16 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"0 151 Male 43 78 17\n",
"1 152 Male 39 78 88\n",
"2 153 Female 44 78 20\n",
"3 154 Female 38 78 76\n",
"4 155 Female 47 78 16"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mall_customers_more = pd.read_csv(os.path.join(\"csv\", \"customers_data_2.csv\"))\n",
"mall_customers_more.head()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(50, 5)"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mall_customers_more.shape"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 40 | \n",
" 191 | \n",
" Female | \n",
" 34 | \n",
" 103 | \n",
" 23 | \n",
"
\n",
" \n",
" 41 | \n",
" 192 | \n",
" Female | \n",
" 32 | \n",
" 103 | \n",
" 69 | \n",
"
\n",
" \n",
" 42 | \n",
" 193 | \n",
" Male | \n",
" 33 | \n",
" 113 | \n",
" 8 | \n",
"
\n",
" \n",
" 43 | \n",
" 194 | \n",
" Female | \n",
" 38 | \n",
" 113 | \n",
" 91 | \n",
"
\n",
" \n",
" 44 | \n",
" 195 | \n",
" Female | \n",
" 47 | \n",
" 120 | \n",
" 16 | \n",
"
\n",
" \n",
" 45 | \n",
" 196 | \n",
" Female | \n",
" 35 | \n",
" 120 | \n",
" 79 | \n",
"
\n",
" \n",
" 46 | \n",
" 197 | \n",
" Female | \n",
" 45 | \n",
" 115 | \n",
" 28 | \n",
"
\n",
" \n",
" 47 | \n",
" 198 | \n",
" Male | \n",
" 32 | \n",
" 115 | \n",
" 74 | \n",
"
\n",
" \n",
" 48 | \n",
" 199 | \n",
" Male | \n",
" 32 | \n",
" 170 | \n",
" 70 | \n",
"
\n",
" \n",
" 49 | \n",
" 200 | \n",
" Male | \n",
" 30 | \n",
" 165 | \n",
" 73 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"40 191 Female 34 103 23\n",
"41 192 Female 32 103 69\n",
"42 193 Male 33 113 8\n",
"43 194 Female 38 113 91\n",
"44 195 Female 47 120 16\n",
"45 196 Female 35 120 79\n",
"46 197 Female 45 115 28\n",
"47 198 Male 32 115 74\n",
"48 199 Male 32 170 70\n",
"49 200 Male 30 165 73"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customers_data_all = pd.concat([customer_data, mall_customers_more])\n",
"#customers_data_all.sample(10)\n",
"customers_data_all.tail(10)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(200, 5)"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"customers_data_all.shape"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"customers_data_all.reset_index(inplace=True, drop=True)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Gender | \n",
" Age | \n",
" Annual_Income | \n",
" Spending Score (1-100) | \n",
"
\n",
" \n",
" \n",
" \n",
" 190 | \n",
" 191 | \n",
" Female | \n",
" 34 | \n",
" 103 | \n",
" 23 | \n",
"
\n",
" \n",
" 191 | \n",
" 192 | \n",
" Female | \n",
" 32 | \n",
" 103 | \n",
" 69 | \n",
"
\n",
" \n",
" 192 | \n",
" 193 | \n",
" Male | \n",
" 33 | \n",
" 113 | \n",
" 8 | \n",
"
\n",
" \n",
" 193 | \n",
" 194 | \n",
" Female | \n",
" 38 | \n",
" 113 | \n",
" 91 | \n",
"
\n",
" \n",
" 194 | \n",
" 195 | \n",
" Female | \n",
" 47 | \n",
" 120 | \n",
" 16 | \n",
"
\n",
" \n",
" 195 | \n",
" 196 | \n",
" Female | \n",
" 35 | \n",
" 120 | \n",
" 79 | \n",
"
\n",
" \n",
" 196 | \n",
" 197 | \n",
" Female | \n",
" 45 | \n",
" 115 | \n",
" 28 | \n",
"
\n",
" \n",
" 197 | \n",
" 198 | \n",
" Male | \n",
" 32 | \n",
" 115 | \n",
" 74 | \n",
"
\n",
" \n",
" 198 | \n",
" 199 | \n",
" Male | \n",
" 32 | \n",
" 170 | \n",
" 70 | \n",
"
\n",
" \n",
" 199 | \n",
" 200 | \n",
" Male | \n",
" 30 | \n",
" 165 | \n",
" 73 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Gender Age Annual_Income Spending Score (1-100)\n",
"190 191 Female 34 103 23\n",
"191 192 Female 32 103 69\n",
"192 193 Male 33 113 8\n",
"193 194 Female 38 113 91\n",
"194 195 Female 47 120 16\n",
"195 196 Female 35 120 79\n",
"196 197 Female 45 115 28\n",
"197 198 Male 32 115 74\n",
"198 199 Male 32 170 70\n",
"199 200 Male 30 165 73"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#customers_data_all.sample(10)\n",
"customers_data_all.tail(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stack, Unstack\n",
"\n",
"\n",
"\n",
"__[Reshape using Stack() and unstack() function in Pandas python](http://www.datasciencemadesimple.com/reshape-using-stack-unstack-function-pandas-python/)__"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" YYYY | \n",
" JAN | \n",
" FEB | \n",
" MAR | \n",
" APR | \n",
" MAY | \n",
" JUN | \n",
" JUL | \n",
" AUG | \n",
" SEP | \n",
" OCT | \n",
" NOV | \n",
" DEC | \n",
" YEAR | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2008 | \n",
" 10140 | \n",
" 10239 | \n",
" 10050 | \n",
" 10111 | \n",
" 10159 | \n",
" 10159 | \n",
" 10141 | \n",
" 10117 | \n",
" 10178 | \n",
" 10148 | \n",
" 10125 | \n",
" 10182 | \n",
" 10146 | \n",
"
\n",
" \n",
" 1 | \n",
" 2009 | \n",
" 10137 | \n",
" 10140 | \n",
" 10140 | \n",
" 10141 | \n",
" 10188 | \n",
" 10168 | \n",
" 10128 | \n",
" 10165 | \n",
" 10208 | \n",
" 10166 | \n",
" 10041 | \n",
" 10068 | \n",
" 10141 | \n",
"
\n",
" \n",
" 2 | \n",
" 2010 | \n",
" 10151 | \n",
" 10034 | \n",
" 10168 | \n",
" 10194 | \n",
" 10158 | \n",
" 10166 | \n",
" 10158 | \n",
" 10129 | \n",
" 10147 | \n",
" 10135 | \n",
" 10057 | \n",
" 10133 | \n",
" 10136 | \n",
"
\n",
" \n",
" 3 | \n",
" 2011 | \n",
" 10182 | \n",
" 10161 | \n",
" 10227 | \n",
" 10192 | \n",
" 10182 | \n",
" 10154 | \n",
" 10123 | \n",
" 10130 | \n",
" 10149 | \n",
" 10182 | \n",
" 10194 | \n",
" 10099 | \n",
" 10165 | \n",
"
\n",
" \n",
" 4 | \n",
" 2012 | \n",
" 10194 | \n",
" 10286 | \n",
" 10271 | \n",
" 10053 | \n",
" 10159 | \n",
" 10127 | \n",
" 10139 | \n",
" 10155 | \n",
" 10149 | \n",
" 10109 | \n",
" 10108 | \n",
" 10085 | \n",
" 10153 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" YYYY JAN FEB MAR APR MAY JUN JUL AUG SEP OCT \\\n",
"0 2008 10140 10239 10050 10111 10159 10159 10141 10117 10178 10148 \n",
"1 2009 10137 10140 10140 10141 10188 10168 10128 10165 10208 10166 \n",
"2 2010 10151 10034 10168 10194 10158 10166 10158 10129 10147 10135 \n",
"3 2011 10182 10161 10227 10192 10182 10154 10123 10130 10149 10182 \n",
"4 2012 10194 10286 10271 10053 10159 10127 10139 10155 10149 10109 \n",
"\n",
" NOV DEC YEAR \n",
"0 10125 10182 10146 \n",
"1 10041 10068 10141 \n",
"2 10057 10133 10136 \n",
"3 10194 10099 10165 \n",
"4 10108 10085 10153 "
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"datos_mensuales = pd.read_csv('./csv/monthly_data.csv')\n",
"datos_mensuales.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" JAN | \n",
" FEB | \n",
" MAR | \n",
" APR | \n",
" MAY | \n",
" JUN | \n",
" JUL | \n",
" AUG | \n",
" SEP | \n",
" OCT | \n",
" NOV | \n",
" DEC | \n",
" YEAR | \n",
"
\n",
" \n",
" YYYY | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2008 | \n",
" 10140 | \n",
" 10239 | \n",
" 10050 | \n",
" 10111 | \n",
" 10159 | \n",
" 10159 | \n",
" 10141 | \n",
" 10117 | \n",
" 10178 | \n",
" 10148 | \n",
" 10125 | \n",
" 10182 | \n",
" 10146 | \n",
"
\n",
" \n",
" 2009 | \n",
" 10137 | \n",
" 10140 | \n",
" 10140 | \n",
" 10141 | \n",
" 10188 | \n",
" 10168 | \n",
" 10128 | \n",
" 10165 | \n",
" 10208 | \n",
" 10166 | \n",
" 10041 | \n",
" 10068 | \n",
" 10141 | \n",
"
\n",
" \n",
" 2010 | \n",
" 10151 | \n",
" 10034 | \n",
" 10168 | \n",
" 10194 | \n",
" 10158 | \n",
" 10166 | \n",
" 10158 | \n",
" 10129 | \n",
" 10147 | \n",
" 10135 | \n",
" 10057 | \n",
" 10133 | \n",
" 10136 | \n",
"
\n",
" \n",
" 2011 | \n",
" 10182 | \n",
" 10161 | \n",
" 10227 | \n",
" 10192 | \n",
" 10182 | \n",
" 10154 | \n",
" 10123 | \n",
" 10130 | \n",
" 10149 | \n",
" 10182 | \n",
" 10194 | \n",
" 10099 | \n",
" 10165 | \n",
"
\n",
" \n",
" 2012 | \n",
" 10194 | \n",
" 10286 | \n",
" 10271 | \n",
" 10053 | \n",
" 10159 | \n",
" 10127 | \n",
" 10139 | \n",
" 10155 | \n",
" 10149 | \n",
" 10109 | \n",
" 10108 | \n",
" 10085 | \n",
" 10153 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" JAN FEB MAR APR MAY JUN JUL AUG SEP OCT \\\n",
"YYYY \n",
"2008 10140 10239 10050 10111 10159 10159 10141 10117 10178 10148 \n",
"2009 10137 10140 10140 10141 10188 10168 10128 10165 10208 10166 \n",
"2010 10151 10034 10168 10194 10158 10166 10158 10129 10147 10135 \n",
"2011 10182 10161 10227 10192 10182 10154 10123 10130 10149 10182 \n",
"2012 10194 10286 10271 10053 10159 10127 10139 10155 10149 10109 \n",
"\n",
" NOV DEC YEAR \n",
"YYYY \n",
"2008 10125 10182 10146 \n",
"2009 10041 10068 10141 \n",
"2010 10057 10133 10136 \n",
"2011 10194 10099 10165 \n",
"2012 10108 10085 10153 "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Preparacion: usar 'YYYY' como el ID/Indice\n",
"datos_mensuales.set_index('YYYY', inplace=True)\n",
"datos_mensuales.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"YYYY \n",
"2008 JAN 10140\n",
" FEB 10239\n",
" MAR 10050\n",
" APR 10111\n",
" MAY 10159\n",
" ... \n",
"2017 SEP 10135\n",
" OCT 10176\n",
" NOV 10141\n",
" DEC 10120\n",
" YEAR 10160\n",
"Length: 130, dtype: int64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#En valor en cada columna se transforma en una fila\n",
"datos_mensuales.stack()"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 11538 entries, 0 to 11537\n",
"Data columns (total 12 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 id 11538 non-null int64 \n",
" 1 name 11538 non-null object \n",
" 2 nationality 11538 non-null object \n",
" 3 sex 11538 non-null object \n",
" 4 date_of_birth 11538 non-null object \n",
" 5 height 11208 non-null float64\n",
" 6 weight 10879 non-null float64\n",
" 7 sport 11538 non-null object \n",
" 8 gold 11538 non-null int64 \n",
" 9 silver 11538 non-null int64 \n",
" 10 bronze 11538 non-null int64 \n",
" 11 info 131 non-null object \n",
"dtypes: float64(2), int64(4), object(6)\n",
"memory usage: 1.1+ MB\n"
]
}
],
"source": [
"athletes = pd.read_csv('./csv/athletes.csv')\n",
"athletes.info()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" nationality | \n",
" sex | \n",
" date_of_birth | \n",
" height | \n",
" weight | \n",
" sport | \n",
" gold | \n",
" silver | \n",
" bronze | \n",
" info | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 736041664 | \n",
" A Jesus Garcia | \n",
" ESP | \n",
" male | \n",
" 1969-10-17 | \n",
" 1.72 | \n",
" 64.0 | \n",
" athletics | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 532037425 | \n",
" A Lam Shin | \n",
" KOR | \n",
" female | \n",
" 1986-09-23 | \n",
" 1.68 | \n",
" 56.0 | \n",
" fencing | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 435962603 | \n",
" Aaron Brown | \n",
" CAN | \n",
" male | \n",
" 1992-05-27 | \n",
" 1.98 | \n",
" 79.0 | \n",
" athletics | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 521041435 | \n",
" Aaron Cook | \n",
" MDA | \n",
" male | \n",
" 1991-01-02 | \n",
" 1.83 | \n",
" 80.0 | \n",
" taekwondo | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 33922579 | \n",
" Aaron Gate | \n",
" NZL | \n",
" male | \n",
" 1990-11-26 | \n",
" 1.81 | \n",
" 71.0 | \n",
" cycling | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name nationality sex date_of_birth height \\\n",
"0 736041664 A Jesus Garcia ESP male 1969-10-17 1.72 \n",
"1 532037425 A Lam Shin KOR female 1986-09-23 1.68 \n",
"2 435962603 Aaron Brown CAN male 1992-05-27 1.98 \n",
"3 521041435 Aaron Cook MDA male 1991-01-02 1.83 \n",
"4 33922579 Aaron Gate NZL male 1990-11-26 1.81 \n",
"\n",
" weight sport gold silver bronze info \n",
"0 64.0 athletics 0 0 0 NaN \n",
"1 56.0 fencing 0 0 0 NaN \n",
"2 79.0 athletics 0 0 1 NaN \n",
"3 80.0 taekwondo 0 0 0 NaN \n",
"4 71.0 cycling 0 0 0 NaN "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"athletes.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"sport sex \n",
"aquatics female 62.284483\n",
" male 82.219061\n",
"archery female 64.301587\n",
" male 80.079365\n",
"athletics female 60.152542\n",
" male 74.777680\n",
"badminton female 61.209877\n",
" male 76.156627\n",
"basketball female 75.377622\n",
" male 100.297872\n",
"boxing female NaN\n",
" male NaN\n",
"canoe female 66.457944\n",
" male 82.150000\n",
"cycling female 60.207254\n",
" male 72.576052\n",
"equestrian female 58.634146\n",
" male 72.954887\n",
"fencing female 62.733871\n",
" male 78.785124\n",
"football female 61.061069\n",
" male 74.451713\n",
"golf female 63.200000\n",
" male 79.000000\n",
"gymnastics female 49.555024\n",
" male 63.254545\n",
"handball female 70.789157\n",
" male 95.431694\n",
"hockey female 60.425926\n",
" male 77.375000\n",
"judo female 65.392157\n",
" male 84.616740\n",
"modern pentathlon female 58.000000\n",
" male 73.916667\n",
"rowing female 69.776190\n",
" male 86.504615\n",
"rugby sevens female 66.595890\n",
" male 90.450331\n",
"sailing female 63.194969\n",
" male 77.122066\n",
"shooting female 62.675676\n",
" male 81.068966\n",
"table tennis female 57.530120\n",
" male 72.558140\n",
"taekwondo female 61.258065\n",
" male 74.809524\n",
"tennis female 64.670455\n",
" male 80.417476\n",
"triathlon female 54.563636\n",
" male 66.814815\n",
"volleyball female 70.684211\n",
" male 89.421875\n",
"weightlifting female 68.788462\n",
" male 87.538961\n",
"wrestling female 61.805310\n",
" male 85.372881\n",
"Name: weight, dtype: float64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weight_mean_by_sport_and_sex = athletes.groupby(['sport', 'sex'])['weight'].mean()\n",
"weight_mean_by_sport_and_sex"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" sex | \n",
" female | \n",
" male | \n",
"
\n",
" \n",
" sport | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" aquatics | \n",
" 62.284483 | \n",
" 82.219061 | \n",
"
\n",
" \n",
" archery | \n",
" 64.301587 | \n",
" 80.079365 | \n",
"
\n",
" \n",
" athletics | \n",
" 60.152542 | \n",
" 74.777680 | \n",
"
\n",
" \n",
" badminton | \n",
" 61.209877 | \n",
" 76.156627 | \n",
"
\n",
" \n",
" basketball | \n",
" 75.377622 | \n",
" 100.297872 | \n",
"
\n",
" \n",
" boxing | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" canoe | \n",
" 66.457944 | \n",
" 82.150000 | \n",
"
\n",
" \n",
" cycling | \n",
" 60.207254 | \n",
" 72.576052 | \n",
"
\n",
" \n",
" equestrian | \n",
" 58.634146 | \n",
" 72.954887 | \n",
"
\n",
" \n",
" fencing | \n",
" 62.733871 | \n",
" 78.785124 | \n",
"
\n",
" \n",
" football | \n",
" 61.061069 | \n",
" 74.451713 | \n",
"
\n",
" \n",
" golf | \n",
" 63.200000 | \n",
" 79.000000 | \n",
"
\n",
" \n",
" gymnastics | \n",
" 49.555024 | \n",
" 63.254545 | \n",
"
\n",
" \n",
" handball | \n",
" 70.789157 | \n",
" 95.431694 | \n",
"
\n",
" \n",
" hockey | \n",
" 60.425926 | \n",
" 77.375000 | \n",
"
\n",
" \n",
" judo | \n",
" 65.392157 | \n",
" 84.616740 | \n",
"
\n",
" \n",
" modern pentathlon | \n",
" 58.000000 | \n",
" 73.916667 | \n",
"
\n",
" \n",
" rowing | \n",
" 69.776190 | \n",
" 86.504615 | \n",
"
\n",
" \n",
" rugby sevens | \n",
" 66.595890 | \n",
" 90.450331 | \n",
"
\n",
" \n",
" sailing | \n",
" 63.194969 | \n",
" 77.122066 | \n",
"
\n",
" \n",
" shooting | \n",
" 62.675676 | \n",
" 81.068966 | \n",
"
\n",
" \n",
" table tennis | \n",
" 57.530120 | \n",
" 72.558140 | \n",
"
\n",
" \n",
" taekwondo | \n",
" 61.258065 | \n",
" 74.809524 | \n",
"
\n",
" \n",
" tennis | \n",
" 64.670455 | \n",
" 80.417476 | \n",
"
\n",
" \n",
" triathlon | \n",
" 54.563636 | \n",
" 66.814815 | \n",
"
\n",
" \n",
" volleyball | \n",
" 70.684211 | \n",
" 89.421875 | \n",
"
\n",
" \n",
" weightlifting | \n",
" 68.788462 | \n",
" 87.538961 | \n",
"
\n",
" \n",
" wrestling | \n",
" 61.805310 | \n",
" 85.372881 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"sex female male\n",
"sport \n",
"aquatics 62.284483 82.219061\n",
"archery 64.301587 80.079365\n",
"athletics 60.152542 74.777680\n",
"badminton 61.209877 76.156627\n",
"basketball 75.377622 100.297872\n",
"boxing NaN NaN\n",
"canoe 66.457944 82.150000\n",
"cycling 60.207254 72.576052\n",
"equestrian 58.634146 72.954887\n",
"fencing 62.733871 78.785124\n",
"football 61.061069 74.451713\n",
"golf 63.200000 79.000000\n",
"gymnastics 49.555024 63.254545\n",
"handball 70.789157 95.431694\n",
"hockey 60.425926 77.375000\n",
"judo 65.392157 84.616740\n",
"modern pentathlon 58.000000 73.916667\n",
"rowing 69.776190 86.504615\n",
"rugby sevens 66.595890 90.450331\n",
"sailing 63.194969 77.122066\n",
"shooting 62.675676 81.068966\n",
"table tennis 57.530120 72.558140\n",
"taekwondo 61.258065 74.809524\n",
"tennis 64.670455 80.417476\n",
"triathlon 54.563636 66.814815\n",
"volleyball 70.684211 89.421875\n",
"weightlifting 68.788462 87.538961\n",
"wrestling 61.805310 85.372881"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Mueve cada valor del ultimo nivel de un indice mutinivel a un columna\n",
"weight_mean_by_sport_and_sex.unstack()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pivot\n",
"\n",
"\n",
"\n",
"__[pandas.DataFrame.pivot](https://pandas.pydata.org/pandas-docs/version/0.22.0/generated/pandas.DataFrame.pivot.html)__"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" item | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 823905 | \n",
" prize | \n",
" 3.49 | \n",
"
\n",
" \n",
" 1 | \n",
" 823905 | \n",
" unit | \n",
" kg | \n",
"
\n",
" \n",
" 2 | \n",
" 235897 | \n",
" prize | \n",
" 12.89 | \n",
"
\n",
" \n",
" 3 | \n",
" 235897 | \n",
" unit | \n",
" l | \n",
"
\n",
" \n",
" 4 | \n",
" 235897 | \n",
" stock | \n",
" 50 | \n",
"
\n",
" \n",
" 5 | \n",
" 983422 | \n",
" prize | \n",
" 0.49 | \n",
"
\n",
" \n",
" 6 | \n",
" 983422 | \n",
" stock | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id item value\n",
"0 823905 prize 3.49\n",
"1 823905 unit kg\n",
"2 235897 prize 12.89\n",
"3 235897 unit l\n",
"4 235897 stock 50\n",
"5 983422 prize 0.49\n",
"6 983422 stock 4"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#El ID se repite para cada uno de las propiedaded del producto :/\n",
"products = pd.DataFrame({'id': [823905, 823905,\n",
" 235897, 235897, 235897,\n",
" 983422, 983422],\n",
" 'item': ['prize', 'unit', \n",
" 'prize', 'unit', 'stock', \n",
" 'prize', 'stock'],\n",
" 'value': [3.49, 'kg',\n",
" 12.89, 'l', 50,\n",
" 0.49, 4]})\n",
"products"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html\n",
"\n",
"De manera similar a unstack, sirve para mover fila como columnas\n",
"y así construir un DF con más columnas y menos filas\n",
"\n",
"Usar id como el indice de cada fila y\n",
"los valores en la columna 'item' para crear columnas.\n",
"Los valores que debe aparecer en cada columnas esta en la columna 'value"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" item | \n",
" prize | \n",
" stock | \n",
" unit | \n",
"
\n",
" \n",
" id | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 235897 | \n",
" 12.89 | \n",
" 50 | \n",
" l | \n",
"
\n",
" \n",
" 823905 | \n",
" 3.49 | \n",
" NaN | \n",
" kg | \n",
"
\n",
" \n",
" 983422 | \n",
" 0.49 | \n",
" 4 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"item prize stock unit\n",
"id \n",
"235897 12.89 50 l\n",
"823905 3.49 NaN kg\n",
"983422 0.49 4 NaN"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"products.pivot(index='id', columns='item', values='value')"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" symbol | \n",
" open | \n",
" high | \n",
" low | \n",
" close | \n",
" volume | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2019-03-01 | \n",
" AMZN | \n",
" 1655.13 | \n",
" 1674.26 | \n",
" 1651.00 | \n",
" 1671.73 | \n",
" 4974877 | \n",
"
\n",
" \n",
" 1 | \n",
" 2019-03-04 | \n",
" AMZN | \n",
" 1685.00 | \n",
" 1709.43 | \n",
" 1674.36 | \n",
" 1696.17 | \n",
" 6167358 | \n",
"
\n",
" \n",
" 2 | \n",
" 2019-03-05 | \n",
" AMZN | \n",
" 1702.95 | \n",
" 1707.80 | \n",
" 1689.01 | \n",
" 1692.43 | \n",
" 3681522 | \n",
"
\n",
" \n",
" 3 | \n",
" 2019-03-06 | \n",
" AMZN | \n",
" 1695.97 | \n",
" 1697.75 | \n",
" 1668.28 | \n",
" 1668.95 | \n",
" 3996001 | \n",
"
\n",
" \n",
" 4 | \n",
" 2019-03-07 | \n",
" AMZN | \n",
" 1667.37 | \n",
" 1669.75 | \n",
" 1620.51 | \n",
" 1625.95 | \n",
" 4957017 | \n",
"
\n",
" \n",
" 5 | \n",
" 2019-03-01 | \n",
" AAPL | \n",
" 174.28 | \n",
" 175.15 | \n",
" 172.89 | \n",
" 174.97 | \n",
" 25886167 | \n",
"
\n",
" \n",
" 6 | \n",
" 2019-03-04 | \n",
" AAPL | \n",
" 175.69 | \n",
" 177.75 | \n",
" 173.97 | \n",
" 175.85 | \n",
" 27436203 | \n",
"
\n",
" \n",
" 7 | \n",
" 2019-03-05 | \n",
" AAPL | \n",
" 175.94 | \n",
" 176.00 | \n",
" 174.54 | \n",
" 175.53 | \n",
" 19737419 | \n",
"
\n",
" \n",
" 8 | \n",
" 2019-03-06 | \n",
" AAPL | \n",
" 174.67 | \n",
" 175.49 | \n",
" 173.94 | \n",
" 174.52 | \n",
" 20810384 | \n",
"
\n",
" \n",
" 9 | \n",
" 2019-03-07 | \n",
" AAPL | \n",
" 173.87 | \n",
" 174.44 | \n",
" 172.02 | \n",
" 172.50 | \n",
" 24796374 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date symbol open high low close volume\n",
"0 2019-03-01 AMZN 1655.13 1674.26 1651.00 1671.73 4974877\n",
"1 2019-03-04 AMZN 1685.00 1709.43 1674.36 1696.17 6167358\n",
"2 2019-03-05 AMZN 1702.95 1707.80 1689.01 1692.43 3681522\n",
"3 2019-03-06 AMZN 1695.97 1697.75 1668.28 1668.95 3996001\n",
"4 2019-03-07 AMZN 1667.37 1669.75 1620.51 1625.95 4957017\n",
"5 2019-03-01 AAPL 174.28 175.15 172.89 174.97 25886167\n",
"6 2019-03-04 AAPL 175.69 177.75 173.97 175.85 27436203\n",
"7 2019-03-05 AAPL 175.94 176.00 174.54 175.53 19737419\n",
"8 2019-03-06 AAPL 174.67 175.49 173.94 174.52 20810384\n",
"9 2019-03-07 AAPL 173.87 174.44 172.02 172.50 24796374"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stocks = pd.read_csv('https://gist.githubusercontent.com/alexdebrie/b3f40efc3dd7664df5a20f5eee85e854/raw/ee3e6feccba2464cbbc2e185fb17961c53d2a7f5/stocks.csv')\n",
"stocks.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" date | \n",
" 2019-03-01 | \n",
" 2019-03-04 | \n",
" 2019-03-05 | \n",
" 2019-03-06 | \n",
" 2019-03-07 | \n",
"
\n",
" \n",
" symbol | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" AAPL | \n",
" 25886167 | \n",
" 27436203 | \n",
" 19737419 | \n",
" 20810384 | \n",
" 24796374 | \n",
"
\n",
" \n",
" AMZN | \n",
" 4974877 | \n",
" 6167358 | \n",
" 3681522 | \n",
" 3996001 | \n",
" 4957017 | \n",
"
\n",
" \n",
" GOOG | \n",
" 1450316 | \n",
" 1446047 | \n",
" 1443174 | \n",
" 1099289 | \n",
" 1166559 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"date 2019-03-01 2019-03-04 2019-03-05 2019-03-06 2019-03-07\n",
"symbol \n",
"AAPL 25886167 27436203 19737419 20810384 24796374\n",
"AMZN 4974877 6167358 3681522 3996001 4957017\n",
"GOOG 1450316 1446047 1443174 1099289 1166559"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stocks.pivot(index='symbol', columns='date', values='volume')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Melt\n",
"\n",
"\n",
"\n",
"__[pandas.DataFrame.melt](https://pandas.pydata.org/pandas-docs/version/0.22.0/generated/pandas.DataFrame.melt.html)__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`melt()` hace lo opuesto a `pivot()`"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" test_1 | \n",
" test_2 | \n",
" test_3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Mary | \n",
" 6 | \n",
" 4 | \n",
" 5 | \n",
"
\n",
" \n",
" 1 | \n",
" John | \n",
" 7 | \n",
" 8 | \n",
" 7 | \n",
"
\n",
" \n",
" 2 | \n",
" Ann | \n",
" 6 | \n",
" 7 | \n",
" 9 | \n",
"
\n",
" \n",
" 3 | \n",
" Pete | \n",
" 6 | \n",
" 5 | \n",
" 5 | \n",
"
\n",
" \n",
" 4 | \n",
" Laura | \n",
" 5 | \n",
" 2 | \n",
" 7 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index test_1 test_2 test_3\n",
"0 Mary 6 4 5\n",
"1 John 7 8 7\n",
"2 Ann 6 7 9\n",
"3 Pete 6 5 5\n",
"4 Laura 5 2 7"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades = pd.DataFrame([[6, 4, 5], [7, 8, 7], [6, 7, 9], [6, 5, 5], [5, 2, 7]], \n",
" index = ['Mary', 'John', 'Ann', 'Pete', 'Laura'],\n",
" columns = ['test_1', 'test_2', 'test_3'])\n",
"grades.reset_index(inplace=True)\n",
"grades"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" variable | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Mary | \n",
" test_1 | \n",
" 6 | \n",
"
\n",
" \n",
" 1 | \n",
" John | \n",
" test_1 | \n",
" 7 | \n",
"
\n",
" \n",
" 2 | \n",
" Ann | \n",
" test_1 | \n",
" 6 | \n",
"
\n",
" \n",
" 3 | \n",
" Pete | \n",
" test_1 | \n",
" 6 | \n",
"
\n",
" \n",
" 4 | \n",
" Laura | \n",
" test_1 | \n",
" 5 | \n",
"
\n",
" \n",
" 5 | \n",
" Mary | \n",
" test_2 | \n",
" 4 | \n",
"
\n",
" \n",
" 6 | \n",
" John | \n",
" test_2 | \n",
" 8 | \n",
"
\n",
" \n",
" 7 | \n",
" Ann | \n",
" test_2 | \n",
" 7 | \n",
"
\n",
" \n",
" 8 | \n",
" Pete | \n",
" test_2 | \n",
" 5 | \n",
"
\n",
" \n",
" 9 | \n",
" Laura | \n",
" test_2 | \n",
" 2 | \n",
"
\n",
" \n",
" 10 | \n",
" Mary | \n",
" test_3 | \n",
" 5 | \n",
"
\n",
" \n",
" 11 | \n",
" John | \n",
" test_3 | \n",
" 7 | \n",
"
\n",
" \n",
" 12 | \n",
" Ann | \n",
" test_3 | \n",
" 9 | \n",
"
\n",
" \n",
" 13 | \n",
" Pete | \n",
" test_3 | \n",
" 5 | \n",
"
\n",
" \n",
" 14 | \n",
" Laura | \n",
" test_3 | \n",
" 7 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index variable value\n",
"0 Mary test_1 6\n",
"1 John test_1 7\n",
"2 Ann test_1 6\n",
"3 Pete test_1 6\n",
"4 Laura test_1 5\n",
"5 Mary test_2 4\n",
"6 John test_2 8\n",
"7 Ann test_2 7\n",
"8 Pete test_2 5\n",
"9 Laura test_2 2\n",
"10 Mary test_3 5\n",
"11 John test_3 7\n",
"12 Ann test_3 9\n",
"13 Pete test_3 5\n",
"14 Laura test_3 7"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades.melt(id_vars=['index']) # indicar las columna que identifican a cada entidad"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}